Photo by CDC on Unsplash

Disclaimer: The purpose of the Johns Hopkins IDD COVIDScenarioPipeline project is to provide tools for analysis of COVID-19-related data. These materials do not cover all aspects of the research process. We highly suggest that you seek external consultation from scientific experts regarding your data and the interpretation of your data.

This tutorial assumes that users have some knowledge of R programming and limited command line experience. It does not require previous knowledge of GitHub. However, the tutorial should be doable by someone without R programming or command line experience.

New to GitHub

If you already have a GitHub account, you can skip this section and move on to the Getting Started section.

What is GitHub?

GitHub is a site that allows users to host and manage code and data files. Thus, you can store your code on the web so that you and others can easily access it (and so that it is safe if something happens to your computer!).

It is especially useful for what is called version control, which allows you to track changes to documents over time.

Although it is intended for version control of code, you can actually use GitHub for version control of many types of documents.

Why do I need an account?

By signing up for an account, you can access up-to-date files and code for the COVIDScenarioPipeline, which makes it easy to run the pipeline on your data.

Better yet, if you learn more about GitHub, you can also use your account to save the files and code for your analysis and track changes over time. You can share your analysis privately with just your team or you can even make it public for others to use.

To learn more about GitHub see here.

Create a GitHub Account:

  1. Click this link

You will see a page that looks something like this:

  2. Fill out a username (any name that works for you), email, and password
  3. Click the green “Sign up for GitHub” button

Getting Started

Making a pipeline repository

First navigate to the Johns Hopkins IDD COVIDScenarioPipeline GitHub repository by clicking here.

You will see a page that looks like this:

Click on the green button that says “Use this template” as shown in the above image.


This will take you to a new page that looks like this:

Here you will:

  1. Provide the name for your repository that you are about to create - “COVID_Pipeline” would work
  2. Decide if you want your repository to be Public or Private
  3. Press the green “Create repository from template” button

Great! Now you have a repository on GitHub which contains all the current COVIDScenarioPipeline files and code.

It should look something like this:

Leave this open! You will want this for the next step!


Get the pipeline files onto your computer

  1. Press the green “Clone or download” button in the GitHub repository that you just created

  2. This will bring up a small window. Press the small button with an icon that looks like a clipboard. This will copy the location of your repository on GitHub.

  3. Open a new project in RStudio (if this is new for you see New to R or RStudio)

  4. Select the Terminal tab in RStudio

  5. Type the following words in the Terminal (but do not press enter yet):

git clone

  6. Paste what is on your clipboard by either using keyboard shortcuts or Edit –> Paste in RStudio

It should look something like this after the dollar sign $:

git clone https://github.com/yourgithubusername/COVID_Pipeline.git

Your GitHub username appears between “github.com” and the name of the repository you created. Make sure you replace the placeholder username with your own!

  7. Press enter

You should see some messages like:

Cloning into 'COVID_Pipeline'...

Once it is complete you will see that you now have a directory named the same as your GitHub repository that contains all the files in the repository.
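The clone address used above follows a fixed pattern, so it can help to see it assembled from its parts. The following is a minimal sketch and not part of the pipeline itself; github_user and repo_name are placeholder values that you would replace with your own.

```shell
# Placeholder values: replace with your own GitHub username and repository name.
github_user="yourgithubusername"
repo_name="COVID_Pipeline"

# GitHub HTTPS clone addresses have the form https://github.com/<user>/<repo>.git
clone_url="https://github.com/${github_user}/${repo_name}.git"

# Print the command rather than running it, so you can check it first.
echo "git clone ${clone_url}"
```

Printing the command first is a deliberate choice here: it lets you confirm the username and repository name before anything is downloaded.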

  8. Now go inside the repo by typing the following command and pressing enter:

cd COVID_Pipeline

  9. Now we will pull files from another GitHub repo by typing the following command and pressing enter:

git clone https://github.com/HopkinsIDD/COVIDScenarioPipeline.git

You should get some output that looks something like this:

You will also now have a directory called “COVIDScenarioPipeline”.
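If you would like to confirm that the second clone landed inside your own repository directory, a small check along these lines may help. This is only a sketch; check_layout is a hypothetical helper name, not a pipeline command.

```shell
# check_layout is a hypothetical helper (not part of the pipeline).
# It succeeds if the given project directory contains the nested
# COVIDScenarioPipeline clone described in the tutorial.
check_layout() {
  project_dir="$1"
  if [ -d "${project_dir}/COVIDScenarioPipeline" ]; then
    echo "ok: ${project_dir}/COVIDScenarioPipeline exists"
  else
    echo "missing: ${project_dir}/COVIDScenarioPipeline" >&2
    return 1
  fi
}

# Example: run from inside your COVID_Pipeline directory.
# check_layout "$(pwd)"
```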

  10. We also need to get large files from this repository

To do this we will need Git Large File Storage, also called git-lfs.

So go to this link and download git-lfs.

Open up a new terminal window (but keep the other one open!). To do this in RStudio you can click the downward arrowhead next to where it says “Terminal1”, like this:

Click on New Terminal.

For more information about Terminals in RStudio see here.

In this new terminal:

On Mac:

If you don’t already have Homebrew, do the following:

  1. Go to your /usr/local directory by typing this command in a new terminal window and pressing enter:

cd /usr/local

  2. Then type this command and press enter:

mkdir homebrew && curl -L https://github.com/Homebrew/brew/tarball/master | tar xz --strip 1 -C homebrew

Then type these commands, pressing enter after each one:

  1. brew install git-lfs

  2. git lfs install

The second command should print a message saying that Git LFS was initialized.
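Before moving on, it can be useful to confirm that both git and git-lfs are visible from your terminal. The following is a minimal sketch; have_cmd is a hypothetical helper name introduced only for this illustration.

```shell
# have_cmd is a hypothetical helper: it succeeds if a command is on the PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report on the two tools this part of the tutorial relies on.
for tool in git git-lfs; do
  if have_cmd "$tool"; then
    echo "found: $tool"
  else
    echo "not found: $tool"
  fi
done
```

If git-lfs is reported as not found right after installing it, opening a fresh terminal is often enough for the new command to be picked up.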

Go back to Terminal1 by clicking on the downward arrow next to Terminal2 and clicking on Terminal1:

Make sure you are still in the COVIDScenarioPipeline directory (the repository directory that you created with your second git clone command) by typing this command and pressing enter:

cd /home/app/covidsp/COVIDScenarioPipeline/

Then type this command and press enter:

git lfs pull

Great, now we have the files we need on our local computer!


Accessing the required R and Python tools on your computer

To get the exact required versions of the R packages and Python packages, modules, and scripts, we can simply use something called Docker.

If you are new to Docker and need to set up an account go to the New to Docker section of the tutorial.

Once you are set up with Docker Desktop and Docker Hub you can proceed with the tutorial.

You can use the RStudio terminal for the next Docker commands or any terminal that you prefer.

For the docker commands in this section, if you run into permissions problems, you will need to put sudo in front of the command.

  1. First, we will pull the docker image from hub.docker.com (You’ll only have to do this the first time).

Type the following command into the Terminal 1 tab of RStudio and press enter.

docker pull hopkinsidd/covidscenariopipeline:latest

You will see something like this:

Note: This will take some time (possibly an hour or more)!

You will know it is finished when it stops printing output and the $ is back!

You should get a message that looks something like this:

If that did not happen, Docker suggests this:

Depending on how you’ve installed docker on your system, you might see a permission denied error after running the above command. If you’re on a Mac, make sure the Docker engine is running. If you’re on Linux, then prefix your docker commands with sudo. Alternatively, you can create a docker group to get rid of this issue.

What did we just do exactly?

The pull command caused Docker to grab the latest version of the hopkinsidd/covidscenariopipeline image and put it on your local machine.

If you type docker images you will now see hello-world and hopkinsidd/covidscenariopipeline listed as repositories on your computer.

Now when we run the hopkinsidd/covidscenariopipeline image with Docker, we can run commands in the hopkinsidd/covidscenariopipeline container. This is similar to running a command in a virtual machine but doesn’t require booting up a virtual machine.

Here are the definitions of the various Docker terms according to Docker:

Images - The blueprints of our application which form the basis of containers. In the demo above, we used the docker pull command to download the busybox image.

Containers - Created from Docker images and run the actual application. We create a container using docker run which we did using the busybox image that we downloaded. A list of running containers can be seen using the docker ps command.

Docker Daemon - The background service running on the host that manages building, running and distributing Docker containers. The daemon is the process that runs in the operating system which clients talk to.

Docker Client - The command line tool that allows the user to interact with the daemon. More generally, there can be other forms of clients too - such as Kitematic which provide a GUI to the users.

Docker Hub - A registry of Docker images. You can think of the registry as a directory of all available Docker images. If required, one can host their own Docker registries and can use them for pulling images.

  2. Now you will run the Docker container with your current directory mounted as /home/app/covidsp/ by typing in one of the following commands (depending on your operating system):

On Linux or Mac:

docker run -it --rm -v "$(pwd)":/home/app/covidsp hopkinsidd/covidscenariopipeline

On Windows:

docker run -it --rm -v %CD%:/home/app/covidsp hopkinsidd/covidscenariopipeline

The -it flag creates an interactive tty (terminal) that allows us to run commands in the container.
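To make the pieces of the Linux or Mac command easier to see, the sketch below assembles it from named parts and prints it instead of running it. The variable names are placeholders chosen for this illustration; the flags and image name are the ones used above.

```shell
# Build the docker run command from its parts so each flag is visible.
host_dir="$(pwd)"                    # directory on your computer
container_dir="/home/app/covidsp"    # where it appears inside the container
image="hopkinsidd/covidscenariopipeline"

# -i keeps standard input open and -t allocates a terminal (together: -it)
# --rm removes the container when you exit it
# -v HOST:CONTAINER makes the host directory visible inside the container
run_cmd="docker run -it --rm -v \"${host_dir}\":${container_dir} ${image}"

# Print the command for inspection instead of running it.
echo "${run_cmd}"
```

Because of the -v mount, files you create under /home/app/covidsp inside the container should also appear in your current directory on your computer.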

Great! Now that you are inside the Docker container, you can take a look around the files located here by typing ls.

You will see something like this:

You might also notice that the information to the left of the $ has changed, as you are now in the container.

docker ps shows you the containers that are running

docker ps -a shows you containers that were run in the past and currently running containers

You are running the container from the /home/app directory.

  1. Now, the Docker container needs some local R packages installed. We can do that by typing the following command (followed by enter):

Rscript local_install.R

If there’s a prompt saying enter one or more numbers, or an empty line to skip updates:, just hit enter.

You will see lots of output printed to the screen.

  1. We also need to mount the COVIDScenarioPipeline files in the Docker container. To do this we will need Terminal2 again. (If you closed it earlier, no worries, just create another new terminal.)

Then run the following command on Linux or Mac:

docker run -it --rm -v "$(pwd)":/home/app/covidsp -v "COVIDScenarioPipeline":/home/app/covidsp/COVIDScenarioPipeline hopkinsidd/covidscenariopipeline

Or run the following command on Windows (this one still needs to be checked; the quotes may not be needed):

docker run -it --rm -v %CD%:/home/app/covidsp -v "COVIDScenarioPipeline":/home/app/covidsp/COVIDScenarioPipeline hopkinsidd/covidscenariopipeline
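One detail worth noting: when the source side of a -v argument is a bare name rather than a path, Docker treats it as a named volume instead of a directory on your computer. The sketch below shows one way to build both mounts from absolute paths on Linux or Mac. pipeline_mounts is a hypothetical helper, and whether the second mount is needed at all is not confirmed by this tutorial.

```shell
# pipeline_mounts is a hypothetical helper: given the project directory, it
# prints the two -v arguments using absolute host paths.
pipeline_mounts() {
  project_dir="$(cd "$1" && pwd)"   # resolve to an absolute path
  printf '%s %s\n' \
    "-v \"${project_dir}\":/home/app/covidsp" \
    "-v \"${project_dir}/COVIDScenarioPipeline\":/home/app/covidsp/COVIDScenarioPipeline"
}

# Print the full command for inspection instead of running it.
echo "docker run -it --rm $(pipeline_mounts .) hopkinsidd/covidscenariopipeline"
```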

  1. We may also need some R packages installed here (this step has not been confirmed):

Rscript COVIDScenarioPipeline/local_install.R

Generating Data


Generate geodata.csv and mobility.csv

  1. Go to the Terminal1 tab, or the terminal where you were first running Docker.
  2. Go to the covidsp directory. In the terminal type the following command and press enter:

cd /home/app/covidsp/

  3. Use an Rscript to create the files. In the terminal type the following command and press enter:

Rscript /home/app/covidsp/COVIDScenarioPipeline/R/scripts/build_US_setup.R -c config.yml -p /home/app/covidsp/COVIDScenarioPipeline

  4. Go to the data directory. In the terminal type the following command and press enter:

cd /home/app/covidsp/data/

Then type the following command and press enter:

ls -l

You should see some files named: mobility.csv and geodata.csv that were created today.
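Instead of reading the ls -l listing by eye, you could check for the two files directly. This is a minimal sketch; check_setup_outputs is a hypothetical helper name, and it only tests that each file exists and is not empty.

```shell
# check_setup_outputs is a hypothetical helper: it reports whether the two
# files named in the tutorial exist and are non-empty in the given directory.
check_setup_outputs() {
  data_dir="$1"
  status=0
  for f in geodata.csv mobility.csv; do
    if [ -s "${data_dir}/${f}" ]; then
      echo "ok: ${f}"
    else
      echo "missing or empty: ${f}"
      status=1
    fi
  done
  return "$status"
}

# Example (inside the container):
# check_setup_outputs /home/app/covidsp/data
```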


Generate Shapefiles

To do this you will need an API key that gives you access to the census data.

To gain access go to this link.

After you fill out the information and press the “Submit Key Request” button, you will receive a message like this:

Check your email for a message to activate your API key.

Once complete you will be taken to a page that says this:

Note: if this doesn’t work the first time, request a new key.

Now that you have the API key, you need to update the config.yml file with your key.

To do this you can either use your favorite editor, like vim, or simply open the config.yml file in RStudio.

This will allow you to easily modify and update the config file with your key.

This will open up the file in an editor in RStudio which will allow you to copy paste your API key.

Note: Make sure you copy paste your API key before the comment (# For use with the tidycensus package. ) or replace the comment like below:
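As a rough illustration of what the edited entry might look like, the sketch below writes a small stand-in file and checks it for a key. The layout (census_api_key nested under importation) is inferred from how the key is read by the R commands in this section; the rest of the real config.yml is not shown here, YOUR_KEY_HERE is a placeholder, and has_census_key is a hypothetical helper that assumes the key is written in double quotes.

```shell
# Write a small stand-in file so the check can be tried safely.
# Assumption: the key sits under "importation" and is double-quoted.
sample_config="$(mktemp)"
cat > "$sample_config" <<'EOF'
importation:
  census_api_key: "YOUR_KEY_HERE" # For use with the tidycensus package.
EOF

# has_census_key is a hypothetical helper: it succeeds if the file contains a
# census_api_key entry with a non-empty quoted value.
has_census_key() {
  grep -Eq '^[[:space:]]*census_api_key:[[:space:]]*"[^"]+"' "$1"
}

if has_census_key "$sample_config"; then
  echo "census_api_key is set"
else
  echo "census_api_key is missing or empty"
fi
```

To try the same check on your own file, you could call has_census_key config.yml from the /home/app/covidsp/ directory.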

Now to create our shape files we will run the following commands:

R
config <- covidcommon::load_config("config.yml")
tidycensus::census_api_key(key = config$importation$census_api_key)
covidImportation::get_county_pops(c('HI'), 'HI')

In this example we are running the pipeline for Hawaii. If you wanted to run the pipeline for a different state you would replace the state abbreviation.
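To show what replacing the state abbreviation might look like, the sketch below prints the R call with a different state substituted in. county_pops_call is a hypothetical helper, and it assumes that both arguments take the same abbreviation, as they do in the Hawaii example; that assumption is not confirmed by this tutorial.

```shell
# county_pops_call is a hypothetical helper that prints the R call from the
# Hawaii example with another state abbreviation substituted in.
# Assumption: both arguments take the same abbreviation, as in the HI example.
county_pops_call() {
  state="$1"
  printf "covidImportation::get_county_pops(c('%s'), '%s')\n" "$state" "$state"
}

county_pops_call "HI"
county_pops_call "MD"
```

The printed line can then be pasted into the R session in place of the Hawaii version.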

After running these commands you will get some output like this: (don’t worry if you see some warnings about dplyr)

To exit R we need to run the following command:

q()

Now if we go to our data directory and list its contents, we will see new files!

cd /home/app/covidsp/data/
ls

We now see a new county_pops_2010.csv file and a shp directory. (The shp directory only shows up if we make more changes to the config.)

Inside the shp directory (see inside with cd shp, followed by ls) you will see several new files:

Edit the rest of the Config file

Build and Run

  1. cd /home/app/covidsp/
  2. Rscript COVIDScenarioPipeline/R/scripts/make_makefile.R -c config.yml
  3. mkdir notebooks
  4. cd notebooks
  5. mkdir HI_today
  6. cd /home/app/covidsp/
  7. Rscript -e 'rmarkdown::draft("notebooks/HI_today/HI_report.Rmd",template="state_report",package="report.generation",edit=FALSE)'
  8. cd notebooks/HI_today/
  9. ls
  10. cd /home/app/covidsp/
  11. echo 'rmarkdown::render("notebooks/HI_today/HI_report.Rmd", params=list(state_usps="HI"))' >compile_Rmd.R
  12. make
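The file-system parts of the steps above (creating the notebook directory and writing compile_Rmd.R) can be sketched as a single function that takes the state as a parameter. prepare_report_files is a hypothetical helper name; it does not run Rscript or make, so those steps would still be run as listed.

```shell
# prepare_report_files is a hypothetical helper covering only the
# file-system steps: it creates notebooks/<STATE>_today and writes
# compile_Rmd.R. It does not run Rscript or make.
prepare_report_files() {
  base_dir="$1"     # for example /home/app/covidsp
  state="$2"        # two-letter state abbreviation, for example HI
  report_dir="notebooks/${state}_today"

  # mkdir -p creates notebooks and the state directory in one step.
  mkdir -p "${base_dir}/${report_dir}"

  # Same content as the echo step in the tutorial, with the state filled in.
  printf '%s\n' \
    "rmarkdown::render(\"${report_dir}/${state}_report.Rmd\", params=list(state_usps=\"${state}\"))" \
    > "${base_dir}/compile_Rmd.R"
}

# Example (inside the container):
# prepare_report_files /home/app/covidsp HI
```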

If you want to rerun

  1. cd /home/app/covidsp/
  2. ls -l
  3. mv .files .oldfiles
  4. make
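If you expect to rerun more than once, a timestamped name keeps each earlier copy separate. The following is a sketch under that assumption; archive_make_state is a hypothetical helper, and it treats .files simply as whatever the mv step above moves aside.

```shell
# archive_make_state is a hypothetical helper: it renames .files to a
# timestamped .oldfiles_* name so earlier backups are not overwritten.
archive_make_state() {
  base_dir="$1"
  if [ -e "${base_dir}/.files" ]; then
    stamp="$(date +%Y%m%d_%H%M%S)"
    mv "${base_dir}/.files" "${base_dir}/.oldfiles_${stamp}"
    echo "moved .files to .oldfiles_${stamp}"
  else
    echo "nothing to archive"
  fi
}

# Example (inside the container), followed by make as in the steps above:
# archive_make_state /home/app/covidsp
```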

(You may also need to remove other old files, such as the shp files.)

New to R or RStudio


Download and install R and RStudio

If you are new to R or RStudio, don’t worry! You can follow these simple steps to get started.

You will need to download and install RStudio (and possibly R if you do not already have it installed).

To do so follow this tutorial.

Create an RStudio project

  1. Go to File –> New Project

  2. Choose the directory for your covid project - likely you would want “New Directory”

  3. Select “New Project” as the Project Type

Note: you may not see all of the same options as shown here

  4. If you selected a new directory, then designate the name of that new directory and double check that its location is somewhere on your computer that you would want. Perhaps COVID_Pipeline would be a good name. We will use this in our examples.

Great! Now you are ready to start using RStudio for the COVIDScenarioPipeline. Return to the Getting Started section of the tutorial.

New to Docker


What is Docker?

Docker allows people to easily have the same software and all of the required dependencies. It is similar to a virtual machine, which allows you to run an instance of a particular operating system with particular software. However, Docker uses your own operating system, so it doesn’t require as much overhead.

Check out this guide for more information.

Why do I need an account?

Create a Docker Account and Download Docker Desktop:

  1. Click this link

You will see a page that looks something like this:

  2. Fill out a Docker ID (any name that works for you), email, and password, and confirm that you are not a robot
  3. Click the blue “Sign Up” button
  4. You will be taken to a new window. Select the free Community Docker Plan
  5. Verify your account through your email
  6. This will take you to a new window that looks like this:

Click on “Get started with Docker Desktop”

  7. This will take you to a window with an image like this, which should have a button below for downloading Docker Desktop on your computer:

Note: This may take some time (possibly more than an hour)!

Installing Docker Desktop

  1. To install Docker follow the instructions for either:
  2. Once installed, the directions should have gotten you to the point where you can run docker run hello-world from the Terminal tab in RStudio (by typing it in and pressing enter).

This should give some output that starts like this:

Great! Now you are ready to return to the Accessing the required R and Python tools on your computer section of the tutorial (just click the name of the section here to return to it).